---
title: IceVision Bboxes - Real Data
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/IceVision-on-espiownage-cleaner.ipynb"
---
{% raw %}
{% endraw %}

This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- S.H. Hawley, July 1, 2021

Installing IceVision and IceData

If on Colab, run the following cell; otherwise check the installation instructions.

{% raw %}
 
{% endraw %} {% raw %}
#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
{% endraw %} {% raw %}
import torch, re

tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)           # strip any '+cuXXX' suffix, e.g. '1.8.1+cu102' -> '1.8.1'
TORCH_VERSION = 'torch' + tv[:-1] + '0'  # pin to the matching x.y.0 wheel, e.g. 'torch1.8.0'
CUDA_VERSION = 'cu' + cv.replace('.', '')

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")

!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
!pip install mmdet -qq
TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu102
{% endraw %}
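The version-string munging above can be checked in isolation. This is a hedged sketch that mirrors the same logic on hard-coded example strings (the version values are illustrative, not taken from any particular machine):

```python
import re

def mmcv_tags(torch_version: str, cuda_version: str):
    """Build the TORCH_VERSION/CUDA_VERSION tags used in the mmcv-full install URL."""
    tv = re.sub(r'\+cu.*', '', torch_version)   # strip any '+cuXXX' suffix
    torch_tag = 'torch' + tv[:-1] + '0'         # e.g. '1.8.1' -> 'torch1.8.0'
    cuda_tag = 'cu' + cuda_version.replace('.', '')
    return torch_tag, cuda_tag

print(mmcv_tags('1.8.1+cu102', '10.2'))  # ('torch1.8.0', 'cu102')
```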

Imports

As always, let's import everything from icevision. We will also need pandas (you might need to install it with pip install pandas).

{% raw %}
from icevision.all import *
import pandas as pd
INFO     - The mmdet config folder already exists. No need to downloaded it. Path : /home/shawley/.icevision/mmdetection_configs/mmdetection_configs-2.10.0/configs | icevision.models.mmdet.download_configs:download_mmdet_configs:17
{% endraw %}

Download dataset

The original tutorials used a small sample of a chess dataset from Roboflow. Here we instead load the espiownage real ("cleaner") dataset, which is local and private; the public espiownage-cyclegan dataset is left commented out below as a reproducible alternative.

{% raw %}
#!rm -rf  /root/.icevision/data/espiownage-cyclegan
{% endraw %} {% raw %}
#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'

# OLD SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://hedges.belmont.edu/~shawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# espiownage cyclegan dataset: cyclegan is public for demo / reproducibility
#data_url = 'https://hedges.belmont.edu/~shawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

from pathlib import Path
data_dir = Path('/home/shawley/datasets/espiownage-cleaner') # real data is local and private.
{% endraw %}

Understand the data format

In this task we are given a .csv file with annotations; let's take a look at it.

!!! danger "Important"
Replace data_dir with your own path to the dataset directory.

{% raw %}
df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
filename width height label xmin ymin xmax ymax
0 06240907_proc_00254.png 512 384 2 31 135 184 290
1 06240907_proc_00256.png 512 384 0 65 153 168 270
2 06240907_proc_00270.png 512 384 1 45 149 164 280
3 06240907_proc_00281.png 512 384 11 0 104 194 333
4 06240907_proc_00282.png 512 384 18 0 103 190 328
{% endraw %}

At first glance, we can make the following observations:

  • Multiple rows share the same filename, width, and height
  • Each row has a label
  • Each row has a bbox [xmin, ymin, xmax, ymax]

Once we know what our data provides we can create our custom Parser.
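The "multiple rows per filename" observation is easy to verify by grouping the annotations by filename. This is a toy sketch with made-up rows shaped like the CSV above, not the actual dataset:

```python
import pandas as pd

# Toy rows shaped like bboxes/annotations.csv (values are illustrative)
df = pd.DataFrame({
    'filename': ['a.png', 'a.png', 'b.png'],
    'width':  [512, 512, 512], 'height': [384, 384, 384],
    'label':  [2, 0, 1],
    'xmin': [31, 65, 45], 'ymin': [135, 153, 149],
    'xmax': [184, 168, 164], 'ymax': [290, 270, 280],
})

# One record per filename, with one bbox per row
boxes_per_image = df.groupby('filename').size()
print(boxes_per_image.to_dict())  # {'a.png': 2, 'b.png': 1}
```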

{% raw %}
df['label'] = 'A'  # antinode
df.head()
filename width height label xmin ymin xmax ymax
0 06240907_proc_00254.png 512 384 A 31 135 184 290
1 06240907_proc_00256.png 512 384 A 65 153 168 270
2 06240907_proc_00270.png 512 384 A 45 149 164 280
3 06240907_proc_00281.png 512 384 A 0 104 194 333
4 06240907_proc_00282.png 512 384 A 0 103 190 328
{% endraw %}

Create the Parser

The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:

{% raw %}
template_record = ObjectDetectionRecord()
{% endraw %}

Now use the generate_template method, which prints out all the necessary steps we have to implement.

{% raw %}
Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_filepath(<Union[str, Path]>)
        record.set_img_size(<ImgSize>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
{% endraw %}

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

  • __init__: What happens here is completely up to you; normally we have to pass some reference to our data, data_dir in our case.

  • __iter__: This tells our parser how to iterate over our data; each item returned here will be passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.

  • __len__: How many items we will be iterating over.

  • record_id: Should return a Hashable (int, str, etc.). In our case we want all the dataset items that have the same filename to be unified in the same record.

  • parse_fields: Here is where the attributes of the record are collected; the template suggests what methods we need to call on the record and what parameters each expects. The parameter o it receives is the item returned by __iter__.
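The grouping behaviour described above (all rows sharing a record_id end up in one record) can be sketched in plain Python, independent of IceVision. Row and parse here are hypothetical stand-ins for illustration, not IceVision API:

```python
from collections import defaultdict
from typing import NamedTuple

class Row(NamedTuple):  # stand-in for one CSV row
    filename: str
    xmin: int; ymin: int; xmax: int; ymax: int

def parse(rows):
    """Group rows by record_id (the filename), collecting one bbox per row."""
    records = defaultdict(list)
    for o in rows:  # what __iter__ yields, one row at a time
        records[o.filename].append((o.xmin, o.ymin, o.xmax, o.ymax))
    return dict(records)

rows = [Row('a.png', 31, 135, 184, 290), Row('a.png', 65, 153, 168, 270)]
print(parse(rows))  # {'a.png': [(31, 135, 184, 290), (65, 153, 168, 270)]}
```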

!!! danger "Important"
Be sure to pass the correct type on all record methods!

{% raw %}
# Note: the class name is carried over from the chess tutorial; renaming it is currently not a priority.
class ChessParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        self.df['label'] = 'A'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
{% endraw %}
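The parser builds boxes with BBox.from_xyxy from the CSV's [xmin, ymin, xmax, ymax] columns. If you ever need COCO-style [x, y, w, h] instead, the conversion is a one-liner; this helper is an illustration, not part of IceVision:

```python
def xyxy_to_xywh(xmin, ymin, xmax, ymax):
    """Convert corner format [xmin, ymin, xmax, ymax] to COCO [x, y, w, h]."""
    return xmin, ymin, xmax - xmin, ymax - ymin

# First row of the CSV above: box spanning (31, 135) to (184, 290)
print(xyxy_to_xywh(31, 135, 184, 290))  # (31, 135, 153, 155)
```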

Let's randomly split the data and parse it with Parser.parse:

{% raw %}
parser = ChessParser(template_record, data_dir)
{% endraw %} {% raw %}
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:136
{% endraw %}

Let's take a look at one record:

{% raw %}
show_record(train_records[5], display_label=False, figsize=(14, 10))
{% endraw %} {% raw %}
train_records[0]
BaseRecord

common: 
	- Record ID: 1693
	- Filepath: /home/shawley/datasets/espiownage-cleaner/images/06241902_proc_01547.png
	- Img: None
	- Image size ImgSize(width=512, height=384)
detection: 
	- Class Map: <ClassMap: {'background': 0, 'A': 1}>
	- Labels: [1, 1]
	- BBoxes: [<BBox (xmin:141, ymin:70, xmax:288, ymax:217)>, <BBox (xmin:161, ymin:248, xmax:276, ymax:347)>]
{% endraw %}

Moving On...

Following the Getting Started "refrigerator" notebook...

{% raw %}
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
{% endraw %}
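For intuition on what the validation transform does to these 512x384 frames: it scales the longest side down to image_size and pads the short side to make a square. This standalone sketch approximates that geometry (it mimics, but is not, the albumentations implementation behind tfms.A.resize_and_pad):

```python
def resize_and_pad_dims(w, h, size):
    """Scale so the longest side == size, then pad the short side up to size."""
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w, pad_h = size - new_w, size - new_h  # total padding on each axis
    return (new_w, new_h), (pad_w, pad_h)

# Our 512x384 frames with image_size=384: scaled to 384x288, padded 96px vertically
print(resize_and_pad_dims(512, 384, 384))  # ((384, 288), (0, 96))
```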

This next cell generates an error; ignore it and move on.

{% raw %}
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
{% endraw %} {% raw %}
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
{% endraw %} {% raw %}
selection = 0


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.mmdet.models.retinanet' from '/home/shawley/envs/icevision/lib/python3.8/site-packages/icevision/models/mmdet/models/retinanet/__init__.py'>,
 <icevision.models.mmdet.models.retinanet.backbones.resnet_fpn.MMDetRetinanetBackboneConfig at 0x7f8ae0eb5ac0>,
 {})
{% endraw %} {% raw %}
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/builder.py:16: UserWarning: ``build_anchor_generator`` would be deprecated soon, please use ``build_prior_generator`` 
  warnings.warn(
Use load_from_local loader
The model and loaded state dict do not match exactly

size mismatch for bbox_head.retina_cls.weight: copying a param with shape torch.Size([720, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([9, 256, 3, 3]).
size mismatch for bbox_head.retina_cls.bias: copying a param with shape torch.Size([720]) from checkpoint, the shape in current model is torch.Size([9]).
{% endraw %} {% raw %}
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
{% endraw %} {% raw %}
model_type.show_batch(first(valid_dl), ncols=4)
{% endraw %} {% raw %}
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
{% endraw %} {% raw %}
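{% endraw %}

COCOMetric scores predictions by the IoU overlap between predicted and ground-truth boxes. A minimal IoU function for xyxy boxes, as a standalone illustration (not the pycocotools implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (xmin, ymin, xmax, ymax) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~= 0.1428
```

{% raw %}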
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
{% endraw %} {% raw %}
learn.lr_find(end_lr=0.01)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
SuggestedLRs(lr_min=8.912509656511247e-05, lr_steep=0.00011220184387639165)
{% endraw %} {% raw %}
learn.fine_tune(60, 1e-4, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 0.647621 0.493189 0.480833 00:48
1 0.466960 0.406789 0.559516 00:44
epoch train_loss valid_loss COCOMetric time
0 0.387776 0.355897 0.583733 00:51
1 0.378959 0.343695 0.595500 00:50
2 0.368419 0.352401 0.581266 00:51
3 0.355132 0.325539 0.611997 00:50
4 0.339097 0.318930 0.615741 00:51
5 0.341662 0.328631 0.618624 00:50
6 0.337708 0.315357 0.608946 00:51
7 0.332005 0.315189 0.622811 00:50
8 0.329290 0.301727 0.629474 00:50
9 0.325461 0.299971 0.634093 00:50
10 0.324105 0.301378 0.639376 00:51
11 0.317925 0.296231 0.632952 00:51
12 0.306069 0.292967 0.639242 00:50
13 0.305219 0.291652 0.642819 00:50
14 0.305737 0.288571 0.640884 00:50
15 0.294238 0.295957 0.640852 00:50
16 0.307462 0.311011 0.645317 00:50
17 0.306659 0.285091 0.640818 00:51
18 0.298948 0.281545 0.654331 00:50
19 0.292347 0.285754 0.647780 00:51
20 0.296638 0.287395 0.627344 00:50
21 0.290526 0.285619 0.637781 00:50
22 0.285002 0.279661 0.647191 00:50
23 0.273406 0.281950 0.649871 00:51
24 0.278973 0.283559 0.641193 00:50
25 0.284989 0.279859 0.649793 00:50
26 0.285490 0.281466 0.647259 00:50
27 0.275190 0.275581 0.654604 00:50
28 0.270250 0.285612 0.641442 00:50
29 0.259137 0.278656 0.654145 00:50
30 0.266376 0.291407 0.637556 00:51
31 0.263960 0.280927 0.651738 00:50
32 0.259236 0.293220 0.617586 00:51
33 0.258383 0.286662 0.630540 00:50
34 0.261132 0.279760 0.645749 00:50
35 0.245475 0.280233 0.638567 00:51
36 0.256014 0.278904 0.646995 00:50
37 0.253262 0.278226 0.656582 00:50
38 0.243158 0.276069 0.651961 00:50
39 0.241100 0.285742 0.641785 00:50
40 0.238516 0.282179 0.645233 00:50
41 0.233589 0.283721 0.645018 00:51
42 0.239969 0.288624 0.644283 00:51
43 0.231322 0.281798 0.648788 00:50
44 0.234531 0.284028 0.644175 00:50
45 0.229073 0.290543 0.639429 00:51
46 0.232352 0.284980 0.642968 00:50
47 0.224711 0.281790 0.649709 00:51
48 0.233579 0.289134 0.647787 00:50
49 0.226735 0.285015 0.647962 00:51
50 0.224235 0.285133 0.650793 00:50
51 0.223406 0.288246 0.640462 00:51
52 0.216097 0.286968 0.646534 00:50
53 0.226346 0.287697 0.642398 00:51
54 0.221578 0.288052 0.642201 00:50
55 0.220476 0.287952 0.644567 00:51
56 0.222692 0.288945 0.642745 00:50
57 0.219724 0.288004 0.645102 00:51
58 0.228891 0.288568 0.644687 00:50
59 0.219850 0.288657 0.644054 00:50
{% endraw %} {% raw %}
model_type.show_results(model, valid_ds, detection_threshold=.5)
{% endraw %} {% raw %}
learn.save('iv_bbox_real')
learn.load('iv_bbox_real'); 
{% endraw %}

Predictions in bulk

Run through the whole dataset, do predictions on everything, write out bounding boxes, order by top losses

{% raw %}
learn.dls.bs
8
{% endraw %} {% raw %}
learn.predict()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-17c52fb00833> in <module>
----> 1 learn.predict()

TypeError: predict() missing 1 required positional argument: 'item'
{% endraw %}

Follow-up:

fastai's learn.predict() requires an item argument, so calling it bare (as above) fails; see the IceVision forum for the recommended bulk-prediction approach.